Letter from the Editor

This volume presents fresh solutions to two perennial challenges for institutional researchers: dealing with declining survey response rates and finding meaning in copious amounts of text data.


In their article An Alternative Approach: Using Panels to Survey College Students, Sarraf, Hurtado, Houlemarde, and Wang describe an experiment to compare results from the standard NSSE administration process with one using much smaller samples and multiple short surveys. Do you believe response rates, scale reliabilities, and factor structures can hold up with panel methods? Read their research to find out if it can work for you.


If you've ever neglected open-ended survey comments simply because you didn't know what to do with all that text, Zilvinskis and Michalski provide a lifeline. Their article, Mining Text Data: Making Sense of What Students Tell Us, walks us through how to extract information from text data and identify software to assist in the process. Their examples illustrate text mining with different types of written artifacts and may inspire you to tackle your text data with renewed confidence.


Do you have a solution to a vexing IR challenge? Please share it with AIR Professional File.


Sincerely, 


Sharron L. Ronco
MINING TEXT DATA: MAKING SENSE OF WHAT STUDENTS TELL US


John Zilvinskis 
Abstract

Text mining presents an efficient means to access the comprehensive amount of data found in written records by converting words into numbers and using algorithms to detect relevant patterns. This article presents the fundamentals of text mining, including an overview of key concepts, prevalent methodologies in this work, and popular software packages. The utility of text mining is demonstrated through description of two promising practices and presentation of two detailed examples. The two promising practices are (1) using text analytics to understand and minimize course withdrawals, and (2) assessing student understanding and depth of learning in science, technology, engineering, and mathematics (STEM) (physics). The two detailed examples are (1) refining survey items on the National Survey of Student Engagement (NSSE) and (2) using text to create learning analytics systems at a community college (City University of New York [CUNY]: the Stella and Charles Guttman Community College, or CUNY Guttman). Results of this study include identification of additional item choices for the survey and discovery of relationships between e-portfolio content and academic performance. Additional examples of text mining in higher education and ethical considerations pertaining to this technology are also discussed.


FRAMING THE ISSUE OF TEXT MINING

Students generate copious amounts of thick, rich data; however, these data are often unexamined because traditional qualitative methodologies used to examine thousands of submissions require extensive resources. Text mining (the machine coding of text with the goal of integrating converted submissions with quantitative methods) offers timely, accurate, and actionable assistance (Zhang & Segal, 2010). This article presents detailed information on how text mining can be used by staff who work in institutional research (IR) and collect, but often are forced to neglect, text-based data.


Examples of text data accessible to IR staff are:

• Application essays,
• Written assignments,
• Open-ended survey responses,
• Course Management Software (CMS) postings,
• Student blogs,
• Course evaluations,
• Surveys, and
• E-portfolios.


IR professionals recognize the depth of text data that are (or potentially can be) collected, but might be uncertain how to process those data and use them for campus research informing decision-making. This article presents an overview of the concepts and strategies of text mining and text analytics and acquaints the reader with the terminology, methodology, and software associated with this technology. Two examples using text data in higher education illustrate how mining can be carried out. It is hoped that the reader will develop a fundamental understanding of text mining, be able to suggest how it can be used to aid decision-makers, and understand the advantages as well as the constraints of this approach to data analysis.
BASIC CONCEPTS

Before describing how the researcher would approach text data, it is important to distinguish between data mining and analytics. Data mining often implies that the researcher is employing algorithms to explore large data sets (Baker & Yacef, 2009; Fayyad, Piatetsky-Shapiro, & Smyth, 1996). Analytics incorporates data mining techniques to create "actionable intelligence," meaning information that guides decision-making (Campbell, DeBlois, & Oblinger, 2007, p. 42). This information is used to predict case-level behavior that will guide intervention (van Barneveld, Arnold, & Campbell, 2012). In education, learning analytics takes the form of collecting real-time data to measure the effectiveness of teaching practices for a particular student, and to suggest interventions in relative real time in the case that they are not effective (Suthers & Verbert, 2013). As will be described later, the timing of when data are collected, processed, implemented, and acted on is critical in moving from data mining to learning analytics (Arnold & Pistilli, 2012).

This article uses similar definitions when describing text mining and text analytics.


A variety of related definitions for text mining can be found in the literature. The following definition is an adaptation of data mining that emphasizes the type of data being mined: "the recovery of useful and previously unknown 'gems' of information from textual document repositories" (Zhang & Segal, 2010, p. 626). Other definitions involve the use of specific methodological and technological approaches: "Text mining is a young interdisciplinary field which draws on information retrieval, data mining, machine learning, statistics and computational linguistics" (Singh & Raghuvanshi, 2012, p. 129).


The rationale for text mining stems from "the need to turn text into numbers so powerful algorithms can be applied to large document databases" (Miner et al., 2012, p. 30). According to Miner et al., text mining and text analytics are broad umbrella terms describing a range of technologies for analyzing and processing semi-structured and unstructured text data. Text mining is the practical application of many techniques of analytical processing in text analytics.
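As a minimal illustration of what "turning text into numbers" can mean, the Python sketch below counts each word in a document (a so-called bag of words) and lines those counts up against a shared vocabulary, so that every document becomes a numeric vector of the same length. It assumes simple lowercase word tokens and uses only the standard library; the two sample sentences are invented, and real projects would first apply the cleaning steps described later in this article.

```python
import re
from collections import Counter

def bag_of_words(document):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z']+", document.lower()))

def to_vector(counts, vocabulary):
    """Express one document's counts as a vector in a fixed vocabulary order."""
    return [counts.get(term, 0) for term in vocabulary]

# Invented example documents.
docs = [
    "I work nights, so the class schedule did not fit.",
    "My work schedule changed and I could not attend class.",
]
counts = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*counts))
vectors = [to_vector(c, vocabulary) for c in counts]
```

Each row of `vectors` can then be handed to the kinds of algorithms the quotation refers to; raw counts are used here only because they are the simplest choice.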


There are several ways that researchers can use software to access, label, and study text data. Familiar to anyone who has used an internet search engine, information access uses recall systems common in most text-mining approaches; however, the absence of generating new information excludes this practice from true text-mining status (Hearst, 1999). Another step toward text mining includes the categorization of text (such as categorizing libraries or academic journals), which can be, in its own right, a means to mine text. Similar techniques can be used in the processes of clustering documents and mining Web content. Beyond pulling specific words or clustering proximity of terms, researchers also intend to extract meaning from the text they study. Those who study computational linguistics contribute to text mining by developing algorithms (a subfield called natural language processing) to measure sentiment and meaning from text. Educational data mining can utilize text in a unique way to measure student learning of important concepts or reflection on developmental milestones (Baker & Yacef, 2009).


Researchers familiar with qualitative research might be skeptical about the effort of those who work in natural language processing and wonder, "Why wouldn't the researcher simply perform traditional qualitative data methods with text data?" Despite the development of intelligent graders and machine learning, computer programs do not have the capacity to interpret the nuance of writing at the level of the human brain. Nonetheless, there are several reasons why text mining is a viable research technique.


First, data sets can include thousands, millions, or billions of text submissions, precluding the use of traditional qualitative techniques. Second, even if the text data were of a size that could be reviewed by researchers, hiring and training coders increases the resources (time and money) needed to complete this process. For most departments, hiring a team of coders might not be practical. When analytics projects are used to identify students who need support, the resulting information can come too late to provide that support if data coding, processing, and interpreting take too long. Having coders also introduces the need to account for inter-rater reliability. Third, these data are converted into numeric values so that they can be combined with other sources of data in statistical models built to predict student behavior. Text mining allows for the scalable and efficient processing of text data.


THE PROCESSES OF MINING TEXT

In their book Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications, Miner et al. (2012) present a comprehensive step-by-step guide for text mining. Their text-mining methodology is very similar to that of other types of research: the researchers define the purpose of the project, manage the data (seek, organize, clean, and extract), model data, evaluate the results, and, finally, disseminate the results. The aspects of data management (organizing, cleaning, and extracting data) are particular to text mining.


First, the researcher generates a corpus, or a collection of documents or cases in which the desired text exists. During this phase the researcher removes all nonessential information, such as e-mail addresses or Web links. Second, the researcher either uses an established "stop" or "include" word list or creates one that filters out words that lack informational value (the, a, his) while highlighting words that do have value. This phase also includes limiting the number of terms by accounting for inflection (plural vs. singular, past vs. present) or the word root (teach for teaching, teacher, teaches). Third, the researcher organizes the terms within cases. This process can be done by using a simple binary notation (1 = term is present) or by noting semifrequent but unique terms (i.e., terms that are featured in some, but not all, of the cases). Another way to organize the terms is by using singular value decomposition (SVD), which reduces the input matrix to a smaller version, representing the variability of each case (Berry & Kogan, 2010; Manning & Schütze, 1999). This latter method is important when working with large text data sets that otherwise could take a long time to process.
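The cleaning and organizing steps above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the procedure of any particular package: the stop list and suffix rules are tiny hypothetical stand-ins for an established stop list and a real stemmer, the matrix uses the simple binary notation (1 = term present), and the SVD step is approximated by power iteration, which recovers only the leading singular value and vector rather than a full decomposition.

```python
import math
import re

# Hypothetical stop list; real projects use a longer, established list.
STOP_WORDS = {"the", "a", "an", "his", "her", "and", "or", "to", "of",
              "i", "me", "my", "at", "in", "is", "see"}
# Crude suffix stripping as a stand-in for a real stemmer (e.g., Porter).
SUFFIXES = ("ing", "er", "es", "s")

def clean(text):
    """Drop nonessential content (e-mail addresses, Web links) and lowercase."""
    text = re.sub(r"\S+@\S+", " ", text)
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    return text.lower()

def stem(word):
    """Strip the first listed suffix that leaves a root of at least four letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def terms(text):
    """Set of stemmed, non-stop-word terms in one case."""
    tokens = re.findall(r"[a-z]+", clean(text))
    return {stem(t) for t in tokens if t not in STOP_WORDS}

def binary_matrix(corpus):
    """Rows are cases, columns are terms; 1 = term is present in the case."""
    term_sets = [terms(doc) for doc in corpus]
    vocabulary = sorted(set().union(*term_sets))
    matrix = [[1 if t in ts else 0 for t in vocabulary] for ts in term_sets]
    return vocabulary, matrix

def top_component(matrix, iterations=100):
    """Leading singular value, term weights, and case scores via power iteration."""
    n_cases, n_terms = len(matrix), len(matrix[0])
    v = [1.0] * n_terms
    for _ in range(iterations):
        av = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # A v
        v = [sum(matrix[i][j] * av[i] for i in range(n_cases))      # A^T (A v)
             for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    case_scores = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # = sigma * u
    sigma = math.sqrt(sum(s * s for s in case_scores))
    return sigma, v, case_scores

# Invented example corpus.
corpus = [
    "Teaching math is my goal. Contact me at student@example.edu",
    "The teacher teaches math; see https://example.edu/blog",
    "I want to help my family",
]
vocabulary, matrix = binary_matrix(corpus)
sigma, term_weights, case_scores = top_component(matrix)
```

On this small invented corpus, "teaching," "teacher," and "teaches" all collapse to the root "teach," and the two cases that share terms receive nonzero scores on the leading component while the unrelated third case scores near zero. Production work would normally call a numerical library's full SVD instead of this hand-rolled loop.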


Once the researcher has organized the data, she can use several ways to extract important information from these cases (Miner et al., 2012):

• Classification. The researcher creates a dictionary to organize terms based on their definition and hierarchical connection; for example, she might file "classes" under the domain of "education."
• Clustering. The researcher groups terms based on the frequency and pattern of their use, compared to the number of students who use those terms.
• Association. The researcher examines the use of text in connection with some event that is occurring. For example, she might want to compare the positive or negative descriptions of faculty teaching, as reflected in course evaluations, before and after final grades are posted.
• Trend analysis. The researcher measures the change of text responses to an identical prompt over time.
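The classification approach in the first bullet can be illustrated with a small two-level dictionary in Python. This is a hedged sketch rather than how any particular product stores its dictionaries: the terms, concepts, and domains below are hypothetical, and matching is done on exact lowercase words.

```python
import re

# Hypothetical dictionary: surface term -> concept.
TERM_TO_CONCEPT = {
    "class": "classes",
    "classes": "classes",
    "course": "classes",
    "teacher": "instruction",
    "professor": "instruction",
    "basketball": "sports",
    "soccer": "sports",
}
# Hypothetical hierarchy: concept -> broad domain.
CONCEPT_TO_DOMAIN = {
    "classes": "education",
    "instruction": "education",
    "sports": "athletics",
}

def classify(text):
    """Return the concepts and broad domains whose dictionary terms appear in text."""
    words = re.findall(r"[a-z]+", text.lower())
    concepts = {TERM_TO_CONCEPT[w] for w in words if w in TERM_TO_CONCEPT}
    domains = {CONCEPT_TO_DOMAIN[c] for c in concepts}
    return concepts, domains

concepts, domains = classify("My classes were hard, but my teacher helped.")
```

Because each term points to a concept and each concept to a domain, adding a new synonym requires only one new dictionary entry, and the domain level comes along automatically.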


SOFTWARE TO CONSIDER

Numerous free and commercial software programs are available for text mining. RapidMiner is an open-source analytics program featuring tools that include the analysis of text data. The visual, point-and-click nature of the interface allows non-programmers to access, clean, and analyze their data. The RapidMiner Web site includes numerous resources, and the user community has posted helpful videos on YouTube. The text extensions include easy-to-use functions such as the ability to group documents based on term frequency. Although the base-level package is free, users might want to purchase the professional package that includes more-advanced options. RapidMiner provides sizable discounts to researchers who use its software, and also provides an extension that allows for the export of data into Tableau. This is an ideal program for professionals who want to begin experimenting with text data.


Waikato Environment for Knowledge Analysis (WEKA) is another open-source program available for download. WEKA uses a system of algorithms through Java to perform analytic functions; Keyphrase Extraction Algorithm (KEA), on the other hand, is an extension of that project focused on text mining. Both WEKA and KEA were developed at the University of Waikato, New Zealand. WEKA is a premier program for machine learning, a field of computer science that emphasizes the development of programs that can recognize patterns and self-evolve, which is relevant in text mining, where programs recognize and adapt to changes in text. KEA has a clean function of recognizing text within a corpus. However, the program does not have a graphical user interface and therefore is mostly accessible to those with computer science backgrounds. Also, although the WEKA Web site continues to collect publications from researchers using the system, as of this writing the KEA Web site has not been updated for some time and has limited resources.


Most IR professionals are familiar with the R program, but they might not be fluent in the programming language needed to make it work. R is a premier statistical software in part because it is free, but also because it has been adopted by a large contingent of statisticians worldwide. There are numerous books and articles on how to use R for text mining; however, because of the time it would take staff to become expert in the use of R and to teach themselves how to program the text mining, the software can lose its advantage. Nonetheless, for staff familiar with the program, it can be a great platform to begin to explore and experiment with text data.


The commercial products do distinguish themselves from the free software. IBM SPSS Modeler Premium offers text mining in addition to the analytics suite; the suite includes other data analysis processes, including variable selection, various analysis applications, and visualization options.


Like most SPSS platforms, the interface is extremely user-friendly, probably to a fault, considering how statistical analysis is reduced to a few button clicks. However, for the purposes of text mining the software is quite advanced. The text function has a built-in dictionary that classes terms and allows the user to create a dictionary. This function creates a hierarchy of similar terms and places them into broad categories such as "athletics," "emotion," and "education." The application also has built-in dictionaries that include a classification scheme for terms in areas such as customer satisfaction. The natural sorting of terms, along with the ease of creating a dictionary, allows for comprehensive and easy classification of text data.


SAS Enterprise Miner software also has a user interface and offers a comprehensive tool for organizing text and clustering cases. Like IBM SPSS Modeler Premium, SAS Enterprise Miner software is an analytics suite that includes some text-mining applications. However, this product distinguishes itself from others in that the clustering of terms is a component of the text-mining function. Often programs will reduce terms to dichotomous values (not present/present) and then employ clustering methods afterward. SAS Enterprise Miner software allows the user to adjust the clustering parameters within the text-mining function and creates data visualizations that are more compelling because of their use of text-mining terms. These features within SAS Enterprise Miner software offer a comprehensive way to cluster terms for analysis.
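The generic "dichotomize, then cluster" workflow mentioned above can be sketched in standard-library Python. This is not a description of SAS Enterprise Miner's internal algorithm; it assumes each term is represented by the set of cases that contain it (its present/not-present pattern), measures dissimilarity between terms with Jaccard distance, and merges terms by average-linkage agglomerative clustering until a chosen number of clusters remains. The case data are invented.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of case IDs."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_terms(case_terms, n_clusters):
    """Average-linkage agglomerative clustering of terms by co-occurrence."""
    # Dichotomize: each term becomes the set of cases in which it is present.
    occurrences = {}
    for case_id, terms in enumerate(case_terms):
        for term in terms:
            occurrences.setdefault(term, set()).add(case_id)

    def linkage(c1, c2):
        # Average distance over all term pairs drawn from the two clusters.
        dists = [jaccard_distance(occurrences[x], occurrences[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    clusters = [[term] for term in sorted(occurrences)]
    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (i < j, so deleting j is safe).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented cases; each set holds the distinct terms found in one submission.
cases = [
    {"family", "mom", "home"},
    {"family", "home"},
    {"basketball", "play", "sport"},
    {"basketball", "sport"},
    {"mom", "family"},
]
groups = cluster_terms(cases, n_clusters=2)
```

Choosing the number of clusters (here, two) and naming the resulting groups remain judgment calls for the researcher.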


PROMISING PRACTICES

In higher education there are numerous opportunities to mine text data to predict important student behavior. For example, application essays can be mined to predict student matriculation or even retention. These data can be incorporated into an analytics project aimed at awarding the appropriate amount of financial aid needed to secure a student's enrollment. Another way in which text mining could be used would be to measure virtual student participation in course management software, such as evaluating students' contributions to a course message board within a learning management system like Blackboard or Canvas. Already researchers are using data to model and predict students' academic performance and assign interventions, such as e-mail notifications, conversations with advisors, and alerts to faculty (Arnold & Pistilli, 2012). These efforts can be enhanced with the analysis of text data.


Using Text Analytics to Understand and Minimize Course Withdrawals

Two current and promising applications of text analytics involve the areas of student course retention and course outcomes. In mining open-ended comments captured from students withdrawing from college courses, Michalski (2014) produced and tested a model that succeeded in categorizing over 95% of student explanations for course withdrawal decisions. This model includes 11 major categories (reasons) for course withdrawal and corresponds well with existing theoretical and empirical research in the area of college course withdrawal. Of the 11 categories, the top three coding categories were (1) time/schedule, (2) job/work, and (3) personal/other reasons provided by students. Other categories included finances, health, family, course/faculty negative, and online course (mentioned by students who stated their desire to take the same course in an instructor-led, rather than online, delivery). Subsequent research (Michalski, 2015) further demonstrates how output from the resulting text model can be statistically analyzed using selected quantitative procedures (including Hierarchical Agglomerative Cluster Analysis, Principal Components Analysis, and Multiple Correspondence Analysis) for validation and for creating clusters of terms describing course withdrawal that are both mathematically and conceptually related. These results are currently being used to develop a course Re-Enrollment Assessment Online (REASON) process to minimize course withdrawals, in part by identifying and providing appropriate support services, advising, and other assistance to encourage and assist student course enrollment decisions.
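To make the idea of a categorization rate concrete, the sketch below assigns each withdrawal comment to zero or more categories with keyword rules and reports the share of comments that received at least one category. It is only an illustration: the keyword lists are hypothetical, cover just a few of the categories named above, and are far simpler than the model Michalski (2014) describes; the sample comments are invented.

```python
import re
from collections import Counter

# Hypothetical keyword rules for a few of the categories named in the text.
RULES = {
    "time/schedule": {"time", "schedule", "conflict"},
    "job/work": {"job", "work", "shift", "hours"},
    "finances": {"afford", "money", "tuition"},
    "health": {"sick", "illness", "surgery"},
    "family": {"family", "child", "mother"},
}

def categorize(comment):
    """Return every category whose keywords appear in the comment (may be empty)."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return {category for category, keywords in RULES.items() if words & keywords}

def summarize(comments):
    """Share of comments with at least one category, plus per-category counts."""
    labels = [categorize(c) for c in comments]
    coverage = sum(1 for cats in labels if cats) / len(comments)
    counts = Counter(category for cats in labels for category in cats)
    return coverage, counts

comments = [
    "My work shift changed.",
    "No time for this class.",
    "I could not afford tuition and my child got sick.",
    "Just because.",
]
coverage, counts = summarize(comments)
```

Allowing several categories per comment mirrors the fact that one explanation can cite more than one reason; the uncategorized remainder is what a researcher would review to refine the rules.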


Assessing Student Understanding and Depth of Learning in STEM

A second promising example of the application of text mining is the analysis of student responses to open-ended assessment questions at Michigan State University's (2015) Collaborative Research in Education, Assessment and Teaching Environments for the fields of science, technology, engineering, and mathematics (CREATE for STEM). There, researchers analyzed student responses to open-ended test questions and related these answers to course outcomes. Within this project at Michigan State, as part of a National Science Foundation grant, Park, Haudek, and Urban-Lurain (2015) used IBM SPSS Modeler to mine the text of short-answer physics test questions about the course topic "energy." The purpose of this study was to explore the degree to which term use is associated with overall knowledge of energy-related constructs. The researchers were able to class terms used in open-ended responses as either surface-level understanding or scientific ideas. Not surprisingly, students who wrote using scientific ideas were more likely to answer corresponding multiple-choice questions correctly. Innovative uses of text mining like these allow for a more robust understanding of student learning, and assist in test design.


TWO DETAILED EXAMPLES

This article now demonstrates the utility of text mining through presentation of two detailed examples: (1) refining survey items on the NSSE, and (2) relating e-portfolio text to student performance at a community college (CUNY Guttman).


Example One: Classifying Open-Ended Survey Questions

When creating surveys, researchers strive to develop closed-ended items that present all of the possible answers for a given question (Dillman, Christian, & Smyth, 2014). Answering open-ended questions requires more mental energy of the respondent and can lead to answers that are more difficult than closed-ended items for the researcher to process. It can be challenging for institutional researchers to develop a list of all the possible survey responses to a particular question. What do researchers do if they are not sure all possible responses are presented? A simple answer is to create a text box next to an "other" option where a respondent can write in an answer. Using text mining, the researcher can organize these write-ins to create a more comprehensive list of options.


In the 2014 administration of the NSSE, an experimental item set was developed to identify types of leadership positions held by students. In the development of this survey item set, researchers wanted to know how often students said they had held a formal leadership position in the core survey, as compared to how often they identified serving in a specific leadership role later in the survey. Of the students who responded to the items, 1,482 of 4,836 (31%) wrote in a leadership position not listed in the core survey item. Text mining was used to determine (1) what new leadership roles should be added to the survey set and (2) the degree to which responses to the leadership item in the core survey related to answers in the experimental item set.

Written entries were classified using the text-mining capability of IBM SPSS Modeler Premium. Of the 1,482 students who entered a response, 630 entries were summarized into eight categories. For example, the concept "teaching assistant" included variations in terms, such as "teaching assistant," "teacher's assistant," and "teachers assistant." The top eight leadership positions written into the open-ended text box and their percent of total written comments are shown in Table 1.
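Grouping spelling variants such as these under one concept can be approximated with fuzzy string matching. The sketch below uses Python's standard `difflib.get_close_matches` to map a write-in entry to the closest canonical position label. This is an assumed stand-in for the product's own term-linking features; the canonical list loosely follows Table 1, and the 0.8 similarity cutoff is a hypothetical tuning value.

```python
import difflib

# Canonical position labels (loosely following Table 1); cutoff is an assumed value.
CANONICAL = [
    "teaching assistant", "research assistant", "tutor",
    "secretary", "treasurer", "mentor", "editor",
]

def normalize(entry, cutoff=0.8):
    """Map a free-text write-in to the closest canonical label, or None."""
    cleaned = " ".join(entry.lower().replace("'", "").split())
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(normalize("Teachering assstant"))
```

Entries that return `None` fall below the cutoff and would go to a researcher for manual review; raising the cutoff trades fewer false matches for more manual work.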


The researchers then calculated how well these positions represented the respondents' understanding of a "formal leadership position." In the core survey, respondents were asked if they had completed (or were in the process of completing) a formal leadership position. By comparing the number of students who identified one of these positions to the number of students who said they had participated in a formal leadership position, the researchers gained a better understanding of which positions constitute "formal leadership" from the perspective of the respondents. For example, students who wrote in "secretaries," "treasurers," and "editor" were more likely to have said they had completed a leadership role.
Table 1. Counts and Percentage of Write-Ins of Additional Leadership Positions Added Through Text Mining

Position              Count    Percentage
Tutoring              …        9.8%
Teaching assistant    …        …
Research assistant    …        4.0%
Secretary             …        …
Treasurer             …        3.8%
Mentor                …        …
Member                …        …
Editor                …        …
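The comparison between write-in positions and the core-survey leadership question amounts to a simple cross-tabulation. A hedged Python sketch follows; it assumes each record pairs a (normalized) write-in position with a True/False answer to the formal leadership item, and the records shown are invented, not NSSE data.

```python
from collections import defaultdict

def share_reporting_formal_leadership(records):
    """For each write-in position, the share of students who also answered
    'yes' to the core-survey formal leadership item."""
    totals = defaultdict(int)
    yes_counts = defaultdict(int)
    for position, said_yes in records:
        totals[position] += 1
        yes_counts[position] += int(said_yes)
    return {position: yes_counts[position] / totals[position] for position in totals}

# Invented records: (normalized write-in position, answer to the core item).
records = [
    ("treasurer", True),
    ("treasurer", True),
    ("tutor", True),
    ("tutor", False),
    ("mentor", False),
]
shares = share_reporting_formal_leadership(records)
```

Positions with a low share are the ones respondents did not tend to regard as formal leadership, even though they wrote them in as leadership roles.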


As a result of this study, the response options "Instructor or Teaching Assistant," "Tutor," and "Editor" were added as leadership positions for the second administration of this experimental item set. Researchers considered installing skip logic that would allow only respondents who answered affirmatively to the formal leadership item on the core survey to access the experimental item set. However, since a large number of respondents who held positions (both the original and written-in options) had not reported completing a formal leadership experience on the core survey, the skip logic was not included. In this instance, text mining allowed the researchers to expand the item bank and improve the survey design.


Example Two: Clustering E-Portfolio Submissions

As part of a Bill & Melinda Gates Foundation-funded project, researchers with CUNY Guttman were interested in mining first-year e-portfolio introductions to see if student text data could predict academic performance. All students were required to attend a summer bridge program prior to their first semester. During that program, students began their e-portfolios by writing introductions of themselves. Some submissions were informal, reading like an online "shout out" instead of an academic piece of work. Other submissions described ambitions and backgrounds.


Using SAS Enterprise Miner, terms from 163 student e-portfolio introductions were clustered to predict student outcomes such as the number of credit hours earned and GPA. The terms were grouped using a cluster analysis within the Enterprise Miner suite; groups of terms were given names by researchers, based on their seemingly shared concept. These concepts, which aligned with student development theory, emerged from the terms; for example, "worrying about making friends" (e.g., shy, person, friend, know) and "commitment to society" (e.g., worker, work, believe, help). These results are summarized in Table 2.


Table 2. Results from Text Clustering of Student E-Portfolio Introductions

Concept                             Clustered Terms
Connection to family                family, york, high school, college, …
Learning                            class, teacher, math, subject, …
Everyday                            now, day, …
College participation               high school, school, guttman, …
Gaming                              game, movie, watch, video, …
Worry about making friends          shy, person, friend, know, …
Recreation                          basketball, play, sport, …
Commitment to society               social, worker, work, believe, help
Technology integration              technology, information, health, …
Aspirations for work in business    business, manhattan, administration, graduate, …

Note. CUNY Guttman is located in Manhattan, so it might be common for students who are describing their connection to family and education to include their connection to New York City.


Variables were coded dichotomously based on whether the term was contained in a concept's cluster. In an ordinary least squares (OLS) regression analysis, after controlling for college preparation (SAT and writing proficiency scores) and age, a relationship (p = .006) was found between the concept "connection to family" and credit hours earned that fall. Students who used terms within this concept earned fewer credit hours than students who did not. Besides identifying the topics that students write about in their e-portfolio introductions, the results of this research identified one concept (connection to family) that predicted the number of credit hours earned. This finding is consistent with the Theory of Student Departure, in which Tinto (1987) argues that family obligations can lead to student attrition.

The ultimate goal of this project was to explore whether text mining could be used in learning analytics at CUNY Guttman. A limitation of this study is that this institution has a small enrollment, so many of the principles of big data (specifically, a larger number of cases or students) will not work. However, by identifying which types of information, such as e-portfolio data, are predictive of student success, researchers at this institution can create analytic models that are accurate, using few variables.

IR professionals often have access to text data such as admissions essays, answers to online tests, and e-portfolio submissions, which are often not incorporated in data analysis because they are too difficult to analyze compared with basic metrics such as SAT score or GPA. However, clustering terms from these corpuses can help to identify themes within these documents. Although clustering terms requires more mathematical training and the naming of the concepts is subjective (as it is in factor analysis), the mathematical grouping of terms provides insight into what students are writing about, while providing units of analysis that can be included in models to predict student success. Clustering student text provides a means to harness these often-overlooked data.

CONSIDERATIONS FOR IMPLEMENTATION

Text mining has the potential of being a featured tool in analytics projects; however, institutional stakeholders need to be intentional in how they collect and process these data so they can use information in a timely manner. Processes for data collection, distribution, project implementation, and analysis of results must be coordinated. As in any analytic project, timing is important. Text data need to be cleaned, processed, and incorporated into predictive models in time for interventions to occur; because of the quick turnaround, most projects require automation.
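As a sketch of the kind of scripted step that supports this automation, the Python code below codes each student's text dichotomously for a concept cluster (1 if any of the cluster's terms appears) and fits an ordinary least squares regression of an outcome on that flag plus one control, loosely following the CUNY Guttman example. Everything here is an assumption for illustration: the cluster terms, the single "preparation score" control, and the toy numbers, which were constructed so that the fitted coefficients are known in advance (intercept 12, concept flag -3, control 0.5) and are not data from that study. It solves the normal equations directly and reports no standard errors or p-values, so it is not a substitute for a statistical package.

```python
def concept_flag(terms, cluster_terms):
    """Dichotomous coding: 1 if any of the concept's clustered terms is present."""
    return 1 if terms & cluster_terms else 0

def ols(X, y):
    """Ordinary least squares by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [r + [v] for r, v in zip(xtx, xty)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Hypothetical cluster and toy records: (terms in the text, preparation score, credits).
FAMILY_CLUSTER = {"family", "mom", "home"}
students = [
    ({"family", "home"}, 10, 14),
    ({"game", "video"}, 12, 18),
    ({"family", "job"}, 14, 16),
    ({"class", "math"}, 8, 16),
    ({"mom", "friend"}, 6, 12),
]
X = [[1, concept_flag(terms, FAMILY_CLUSTER), prep] for terms, prep, _ in students]
y = [credits for _, _, credits in students]
intercept, family_effect, prep_effect = ols(X, y)
```

Once written this way, the same script can be rerun on each new term's submissions without manual recoding, which is the practical point of automating the pipeline.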


Another aspect to consider is the data source or origin. The two examples in this article illustrate different sources of text data. In the first example, asking respondents to submit one "other" leadership position seems straightforward and easy to sort; however, there can be issues with classifying even these basic data. For example, some responses do not easily or naturally fit into a category, while others might be placed in more than one category. In the second example, researchers were mining students' introduction statements for specific information, but the prompts might have been too vague. Alternatively, prompts can be too prescriptive. Contrast the prompts "What do you think it takes to succeed in college?" and "Who will you ask to mentor you so you will be successful in college?" Similar to survey item design, the creation of prompts for text mining might develop into its own science. (For information regarding the intersection of text mining and survey question design, see George et al., 2012.)


Another aspect to consider when using text data for predictive models is the use of sentiment analysis: Is it only important to know what a student is writing, or is it also important to understand how a student feels about a particular topic? There might be text sources, such as student course evaluations or satisfaction surveys, where it would be important to consider sentiment. In these cases, the sentiment analysis component of customer satisfaction software can be implemented to measure student attitude when ascribing certain phenomena as positive, negative, or indifferent. IBM SPSS Modeler Premium has a built-in library for measuring customer satisfaction, allowing for a more refined and adjustable view of individual likes and dislikes. Measuring sentiment might be crucial for stakeholders (e.g., administrators and faculty); therefore, for researchers implementing text-mining projects, the analysis of sentiment might be an important aspect when trying to garner buy-in to approve text-mining projects.
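A lexicon-based approach is one simple way to label comments as positive, negative, or indifferent. The Python sketch below is an assumption-laden illustration, not the method used by any commercial product: it scores each comment by counting words from a tiny hypothetical positive and negative word list, and it flips the polarity of a word that directly follows a negator such as "not."

```python
import re

# Hypothetical mini-lexicon; production tools ship much larger, curated ones.
POSITIVE = {"helpful", "clear", "great", "engaging", "fair"}
NEGATIVE = {"confusing", "boring", "unfair", "late", "rude"}
NEGATORS = {"not", "never", "no"}

def sentiment(comment):
    """Return 'positive', 'negative', or 'indifferent' from a net lexicon score."""
    score, negate = 0, False
    for word in re.findall(r"[a-z]+", comment.lower()):
        if word in NEGATORS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        # A negator flips only the word immediately after it.
        score += -polarity if negate else polarity
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "indifferent"

print(sentiment("Grading was not fair and feedback was late."))
```

Counting words is transparent and easy to explain to stakeholders, but it misses sarcasm and context, which is one reason commercial libraries use larger, domain-tuned resources.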


ETHICAL CONSIDERATIONS

As we continue to evolve in this era of analytics, it is not uncommon to hear concerns about how data are being collected and used. Students might believe that their intellectual property or even identity has been exploited when researchers use their text data, some of which can be very personal (Slade & Prinsloo, 2013). Because there is much at stake in terms of student response to analytics, institutions using these technologies need to develop sophisticated policies regarding the data ownership, transparency, and security of data (Pardo & Siemens, 2014). Furthermore, institutions are sometimes ill-equipped to grapple with the more complex issues around the use of data within analytics, such as what to do when unexpected outcomes occur, for example, predictive models that misclassify student potential (Willis, 2014). All of these factors reflect a campus culture in which administrators act ethically in using data to improve student success, to ensure that students and faculty are valued partners with technologies, and to create an environment that views these technologies as advancing those aims (Willis, Campbell, & Pistilli, 2013). These concerns over analytics also relate to the use of text mining.
relate to the use of text mining 


One of the benefits of text mining is that it presents a bridge between the atheoretical process of data mining/analytics and the more theoretically driven approaches to research used in higher education. It can be unnerving for campus stakeholders, including faculty, administrators, and students, to rely on the black box of analytics. Text mining raises ethical concerns about the results of analytics processes. What is the implication when an algorithm predicts that a student is likely to pass a course or persist within a particular major? What aspects of students' backgrounds and institutional environments are either being overlooked or considered in making these predictions? Furthermore, institutional policy is traditionally built on empirical studies of students that reference some theoretical framework or at least a plausible hypothesis. For example, first-generation college students are disadvantaged, and therefore administrators argue that they should receive additional resources as a matter of policy. Gathering stakeholders to support this one aspect of identity is easy because it is an understandable concept: "Students who did not grow up in a home where at least one parent completed college are at a disadvantage compared with students who did." However, in an analytics project, first-generation status might be one variable among many used to predict student success. These complex models replace an understandable narrative, such as the one related to first-generation status, with a narrative that is reliant on numerous factors. On the other hand, the results of text mining can be easily interpreted. In the second example of this study, there was an inverse relationship between a student mentioning a connection to family and earning credit hours. An educator meeting with students and reviewing ePortfolio introductions would find it easy to act on this information.
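To see why a result like this is easy to interpret, consider a sketch that reduces each introduction to a single yes/no indicator (does it mention family?) and compares average credit hours across the two groups. The term list, function names, and records below are hypothetical and made up only to exercise the code; they are not the study's data, term list, or method, and the printed averages say nothing about the study's findings.

```python
import re
from statistics import mean

# Hypothetical family-related terms; the study's actual term list is not given.
FAMILY_TERMS = {"family", "mother", "father", "parents", "siblings", "mom", "dad"}


def mentions_family(text: str) -> bool:
    """Return True if any family-related term appears as a word in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not FAMILY_TERMS.isdisjoint(words)


# Made-up toy records: (introduction text, credit hours earned).
records = [
    ("I want to make my family proud.", 9),
    ("My parents drove me to campus.", 12),
    ("I love chemistry and want to do research.", 15),
    ("I plan to join the debate team.", 14),
]


def mean_hours_by_flag(rows):
    """Mean credit hours for (mentions family, does not); both groups must be non-empty."""
    flagged = [hours for text, hours in rows if mentions_family(text)]
    unflagged = [hours for text, hours in rows if not mentions_family(text)]
    return mean(flagged), mean(unflagged)


print(mean_hours_by_flag(records))
```

The output is two averages tied to one plain-language indicator, which is the kind of result an advisor can act on directly, in contrast to a model that weighs many variables at once.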


There are other institutional cultural aspects to consider when implementing text-mining programs on campus. Students could react to the common knowledge that all of their submissions on the campus management system are being mined, and they could change their responses once they realize that their text is being mined. In the second example of this study, it might be the case that upper-class students will warn incoming first-year students, "Make sure not to mention a connection to your family in your ePortfolio introduction; otherwise, you'll have to schedule an additional meeting with your advisor."


When considering equity, institutional stakeholders need to take the increasingly heterogeneous nature of student demography into consideration. Researchers need to account for diverse levels of familiarity with English, resources, and prior education when working with student text data. Simply relying on text data without regard to student background variables can mislead. Students who have better educational preparation or more resources may be advantaged in the types of phenomena they describe in text. Thus, text mining could be used to reinforce inequity within higher education. These aspects, though not the focus of this article, raise the question of whether the researcher should mine text data. Researchers need to consider that question at length before they consider whether they are able to mine text data.


PUTTING TEXT MINING TO USE


This article describes the ways in which text mining can be implemented in the work of those institutional researchers who are asked to create analytics programs on their campuses. The article describes two promising application areas and details two examples of text-mining projects. Important considerations for any text-mining project are the context and timing of text collection. Although text mining offers several promising avenues within higher education, ethical considerations should provoke deep questions about the appropriateness of these methods, balanced with the needs of the institution.


Like many projects within an institution, successful text mining requires multiple types of expertise (Haight, 2014). First, experts in data acquisition and management are needed to procure and then clean data (e.g., to remove extraneous text) while pairing data with other sources of student information. Second, someone must mine and model data into predictive schemes that are statistically sound. Third, expertise in graphic design or data visualization is needed to successfully communicate the connection between data source, usable text, and student outcomes. Fourth, as described earlier, automation (i.e., creating systems to automatically collect, clean, and process data) is paramount in any analytics process, so a team member needs to be in charge of this aspect of the project. Finally, text-mining projects need a team champion to advertise and demonstrate the value of this technology and to promote it to campus stakeholders. These skills are essential to integrating text mining into campus decision-making. Text mining presents a microcosm of other IR processes while offering an exciting way to make use of dense text data waiting to be unlocked. As a field, IR is in a unique position: text data have been collected for decades, and the complex decision-making on college campuses can only be further informed by the inclusion of those data.
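The automated collect, clean, and process step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed design: it assumes submissions arrive as raw text keyed by a student ID and that other student information is available in a matching lookup. The function names, the ID scheme, the `first_gen` attribute, and the choice of HTML-tag stripping as the cleaning step are all hypothetical.

```python
import re
from collections import Counter


def clean(text: str) -> str:
    """Remove HTML tags, collapse whitespace, and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def term_counts(text: str) -> Counter:
    """Count alphabetic word tokens in already-cleaned text."""
    return Counter(re.findall(r"[a-z]+", text))


def run_pipeline(submissions: dict, student_info: dict) -> list[dict]:
    """Clean each submission, pair it with student records, and count terms.

    submissions:  student_id -> raw submitted text (hypothetical export format)
    student_info: student_id -> dict of other student attributes
    Submissions without a matching student record are skipped.
    """
    rows = []
    for sid, raw in submissions.items():
        if sid not in student_info:
            continue  # cannot pair this text with other student information
        rows.append({"id": sid, **student_info[sid], "terms": term_counts(clean(raw))})
    return rows


# Hypothetical example inputs.
submissions = {
    "s1": "<p>I enjoy <b>biology</b> labs.</p>",
    "s2": "Hello   world",
    "s3": "No matching record",
}
student_info = {"s1": {"first_gen": True}, "s2": {"first_gen": False}}
rows = run_pipeline(submissions, student_info)
for row in rows:
    print(row["id"], row["first_gen"], dict(row["terms"]))
```

Keeping cleaning, pairing, and term counting as separate functions lets the team member responsible for automation rerun or swap out one stage (e.g., a different cleaning rule) without touching the others.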